Module 9: Writing Functions

Jacob Jameson
Fall 2021

Functions

# example of a function
circle_area <- function(r) { 
  pi * r ^ 2
}
  • What are functions and why do we want to use them?
  • How do we write functions in practice?
  • What are some solutions to avoid frustrating code?

Motivation

“You should consider writing a function whenever you’ve copied and pasted a block of code more than twice (i.e. you now have three copies of the same code)” - Hadley Wickham, R for Data Science

Instead of repeating code . . .

data %>%
  mutate(a = (a - min(a)) / (max(a) - min(a)),
         b = (b - min(b)) / (max(b) - min(b)),
         c = (c - min(c)) / (max(c) - min(c)),
         d = (d - min(d)) / (max(d) - min(d)))

Write a function

rescale_01 <- function(x) {
  (x - min(x)) / (max(x) - min(x))
}

data %>%
  mutate(a = rescale_01(a),
         b = rescale_01(b), 
         c = rescale_01(c), 
         d = rescale_01(d))
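
As a quick sanity check, the sketch below applies rescale_01() to a small made-up data frame. The name toy and its values are assumptions for illustration, not data from this module. It uses base R's lapply() rather than mutate() so that it runs without loading any packages.

```r
rescale_01 <- function(x) {
  (x - min(x)) / (max(x) - min(x))
}

# Hypothetical toy data, invented for this illustration
toy <- data.frame(a = c(1, 5, 9), b = c(10, 20, 40))

# Apply the same function to every column instead of copying the formula
toy[] <- lapply(toy, rescale_01)

toy$a         # 0.0 0.5 1.0
range(toy$b)  # 0 1
```

Every column now runs from 0 to 1, and a change to the formula only has to be made in one place.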

Function anatomy

The anatomy of a function is as follows:

function_name <- function(arguments) { 
  do_this(arguments)
  }

A function consists of

  1. Function arguments
  2. Function body

We can assign the function to a name like any other object in R.

Function anatomy: example

  • arguments: x
  • body: (x - min(x)) / (max(x) - min(x))
  • assign to name: rescale_01
rescale_01 <- function(x) {
  (x - min(x)) / (max(x) - min(x))
  }

Note that we don’t need to explicitly call return()

  • the value of the last expression evaluated in the body is what the function returns.
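
The return behaviour can be checked with a small sketch. The function names below (area_implicit, area_explicit, area_assigned) are hypothetical and exist only for this comparison.

```r
# Implicit return: the value of the last evaluated expression
area_implicit <- function(r) {
  pi * r^2
}

# Explicit return: same result, spelled out
area_explicit <- function(r) {
  return(pi * r^2)
}

identical(area_implicit(2), area_explicit(2))  # TRUE

# If the last expression is an assignment, the value is still returned,
# but invisibly: nothing prints at the console unless you store or print it
area_assigned <- function(r) {
  area <- pi * r^2
}
area_assigned(2)            # prints nothing
stored <- area_assigned(2)
stored                      # 12.56637
```

The invisible case is a common source of confusion: a function that ends in an assignment appears to return nothing when called at the console.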

Writing a function: printing output

You start writing code to say Hello to all of your friends.

  • You notice it’s getting repetitive… time for a function
print("Hello Kashif!")
[1] "Hello Kashif!"
print("Hello Zach!") 
[1] "Hello Zach!"
print("Hello Deniz!")
[1] "Hello Deniz!"
# and so on...

Writing a function: parameterize the code

Start with the body.

Ask: What part of the code is changing?

  • Make this an argument

Rewrite the code to accommodate the parameterization

# print("Hello Kashif!") becomes ...
name <- "Kashif" 
print(paste0("Hello ", name, "!"))
[1] "Hello Kashif!"

Check several potential inputs to avoid future headaches
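
One possible way to check several inputs at once is to loop over them before wrapping the code in a function. The list test_names is a made-up set of inputs, including a number and a missing value, chosen to show how paste0() treats non-character input.

```r
# Hypothetical test inputs: two ordinary names, a number, and a missing value
test_names <- list("Kashif", "Zach", 42, NA)

for (name in test_names) {
  print(paste0("Hello ", name, "!"))
}
# [1] "Hello Kashif!"
# [1] "Hello Zach!"
# [1] "Hello 42!"
# [1] "Hello NA!"
```

Nothing fails here, but the last two greetings are probably not what we want, which is worth knowing before the code is reused.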

Writing a function: add the structure

# name <- "Kashif"
# print(paste0("Hello ", name, "!"))

function(name) {
print(paste0("Hello ", name, "!"))
  }

Writing a function: assign to a name

Try to use names that actively tell the user what the code does

  • We recommend verb_thing()
  • Good: calc_size() or compare_prices()
  • Bad: prices(), calc(), or fun1()
# name <- "Kashif"
# print(paste0("Hello ", name, "!"))

say_hello_to <- function(name) { 
  print(paste0("Hello ", name, "!"))
  }

Simple example: printing output

Test out different inputs!

say_hello_to('Kashif')
[1] "Hello Kashif!"
say_hello_to('Zach')
[1] "Hello Zach!"
say_hello_to('Deniz')
[1] "Hello Deniz!"
# Cool, this function is vectorized!
say_hello_to(c("Jason", "Devina", "Andrew"))
[1] "Hello Jason!"  "Hello Devina!" "Hello Andrew!"

Question: does name exist in my R environment after I run this function? Why or why not?
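
One way to investigate the question is to ask R directly. This sketch assumes a fresh session in which no object called name has been created by hand.

```r
say_hello_to <- function(name) {
  print(paste0("Hello ", name, "!"))
}

say_hello_to("Kashif")

# Look only in the global environment, not in attached packages
exists("name", envir = globalenv(), inherits = FALSE)  # FALSE in a fresh session
```

The argument name lives only inside the function while it runs, so calling the function leaves the global environment unchanged.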

Technical aside: typeof(your_function)

Like other R objects, functions have types.

Primitive functions such as + and sum() are of type “builtin”

typeof(`+`)
[1] "builtin"
typeof(sum)
[1] "builtin"

User-defined functions, functions loaded from packages, and many base R functions are of type “closure”:

typeof(say_hello_to)
[1] "closure"
typeof(mean)
[1] "closure"
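
A short sketch of related checks, using only base R. Note that a few primitives, such as `if`, report a third type, “special”, and that is.function() does not care which type a function has.

```r
say_hello_to <- function(name) print(paste0("Hello ", name, "!"))

typeof(`if`)                # "special": a primitive that is not "builtin"
is.primitive(sum)           # TRUE
is.primitive(mean)          # FALSE
is.primitive(say_hello_to)  # FALSE

# Regardless of type, is.function() identifies all of them as functions
is.function(sum) && is.function(mean) && is.function(say_hello_to)  # TRUE
```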

Technical aside: typeof(your_function)

This is background knowledge that might help you understand an error.

For example, you thought you assigned a number to the name “c” and want to calculate a ratio.

ratio <- 1 / c

“Error in 1/c : non-numeric argument to binary operator”

as.integer(c)

“Error in as.integer(c) : cannot coerce type 'builtin' to vector of type 'integer'”

Seeing “builtin” or “closure” in an error like this lets you know you’re working with a function!
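
The situation can be reproduced safely with tryCatch(), which captures the error message instead of stopping the script. This is a sketch that assumes a fresh session in which c has not yet been reassigned; the value 4 is arbitrary.

```r
# In a fresh session, c is still the base R function that combines values
is.function(c)  # TRUE

# Capture the error message instead of stopping the script
msg <- tryCatch(1 / c, error = function(e) conditionMessage(e))
msg  # "non-numeric argument to binary operator"

# Once a number is actually assigned, the ratio works
c <- 4
ratio <- 1 / c
ratio  # 0.25
```

Checking is.function() on a name is a quick way to confirm the diagnosis when an error mentions “builtin” or “closure”.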

Second example: calculating the mean of a sample

Your stats prof asks you to simulate the central limit theorem by calculating the mean of samples from the standard normal distribution with increasing sample sizes.

mean(rnorm(1))
[1] -1.289279
mean(rnorm(3))
[1] 0.7522052
mean(rnorm(30))
[1] 0.2555396

Second example: calculating the mean of a sample

The number is changing, so it becomes the argument.

calc_sample_mean <- function(sample_size) { 
  mean(rnorm(sample_size))
  }
  • The number is the sample size, so I call it sample_size. n would also be appropriate.

  • The body code is otherwise identical to the code you already wrote.

Second example: calculating the mean of a sample

For added clarity you can unnest your code and assign the intermediate results to meaningful names.

calc_sample_mean <- function(sample_size) { 
  random_sample <- rnorm(sample_size) 
  sample_mean <- mean(random_sample)

  return(sample_mean) 
  }

return() explicitly tells R what the function will return.

  • The value of the last expression evaluated is returned by default.
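
Because rnorm() draws random numbers, the function returns a different value on every call, which makes it awkward to test. One common approach, sketched here, is to fix the random seed with base R's set.seed(). The seed value 2021 is arbitrary.

```r
calc_sample_mean <- function(sample_size) {
  random_sample <- rnorm(sample_size)
  sample_mean <- mean(random_sample)

  return(sample_mean)
}

# Fixing the seed makes the random draws, and so the result, repeatable
set.seed(2021)  # arbitrary seed chosen for this illustration
first_run <- calc_sample_mean(30)

set.seed(2021)
second_run <- calc_sample_mean(30)

identical(first_run, second_run)  # TRUE
```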

Second example: calculating the mean of a sample

If the function fits on one line, you can write it without the curly braces:

calc_sample_mean <- function(n) mean(rnorm(n))

Some settings call for anonymous functions, where the function has no name.

function(n) mean(rnorm(n))
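
A typical setting for an anonymous function is as an argument to another function. The sketch below passes one to base R's sapply(); the vector sample_sizes is an assumed example input.

```r
sample_sizes <- c(1, 3, 30)

# The anonymous function is called once per element of sample_sizes
sample_means <- sapply(sample_sizes, function(n) mean(rnorm(n)))

length(sample_means)  # 3
```

The function is defined where it is used and never assigned to a name, which is convenient when it is needed only once.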

Always test your code

Try to foresee the kind of input you expect to use.

calc_sample_mean(1)
[1] -0.8116221
calc_sample_mean(1000)
[1] 0.02135588

We see below that this function is not vectorized. We might hope to get 3 sample means out but only get 1.

# read ?rnorm to understand how rnorm
# interprets vector input.
calc_sample_mean(c(1, 3, 30))
[1] 0.165544
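
The reason for the single value can be seen by looking at rnorm() on its own. According to ?rnorm, when the first argument has length greater than one, its length is taken as the number of draws required. A minimal sketch:

```r
# With a vector as its first argument, rnorm() uses the vector's length
# as the number of draws, not the values inside it
draws <- rnorm(c(1, 3, 30))
length(draws)  # 3

calc_sample_mean <- function(sample_size) mean(rnorm(sample_size))

# So this is one mean of three draws, not three means
length(calc_sample_mean(c(1, 3, 30)))  # 1
```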

How to deal with unvectorized functions

If we don’t want to change our function but want to apply it to a vector of inputs, we have a couple of options. Here we use rowwise() from dplyr.

library(dplyr)  # assumed here: provides tibble(), %>%, rowwise(), and mutate()

# create a tibble of sample sizes to test our function
sample_tibble <- tibble(sample_sizes = c(1, 3, 10, 30))
# rowwise() groups the data by row, so calc_sample_mean() runs once per row
sample_tibble %>%
  rowwise() %>%
  mutate(sample_means = calc_sample_mean(sample_sizes))
# A tibble: 4 x 2
# Rowwise: 
  sample_sizes sample_means
         <dbl>        <dbl>
1            1      -0.712 
2            3      -0.727 
3           10      -0.182 
4           30       0.0264
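
Another option, sketched here with base R only, is Vectorize(), which wraps an existing function so that it is called once per element. The name calc_sample_means is hypothetical.

```r
calc_sample_mean <- function(sample_size) mean(rnorm(sample_size))

# Vectorize() returns a new function that calls calc_sample_mean()
# once for each element of sample_size
calc_sample_means <- Vectorize(calc_sample_mean)  # hypothetical name

sample_means <- calc_sample_means(c(1, 3, 10, 30))
length(sample_means)  # 4
```

The original function stays unchanged, and the wrapped version returns one mean per sample size.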

Adding additional arguments

If we want to be able to adjust the details of how our function runs we can add arguments

  • typically, we put “data” arguments first
  • and then “detail” arguments after
calc_sample_mean <- function(sample_size, our_mean, our_sd) {
  sample <- rnorm(sample_size,
                  mean = our_mean,
                  sd = our_sd)
  mean(sample)
  }

Setting defaults

We usually set default values for “detail” arguments.

calc_sample_mean <- function(sample_size, 
                             our_mean=0, 
                             our_sd=1) {

  sample <- rnorm(sample_size, 
                  mean = our_mean,
                  sd = our_sd)
   mean(sample)
  }
# uses the defaults
calc_sample_mean(sample_size = 10)
[1] -0.4997093

Setting defaults

# we can change one or two defaults.
# You can refer by name, or use position
calc_sample_mean(10, our_sd = 2)
[1] -0.3240156
calc_sample_mean(10, our_mean = 6)
[1] 5.634323
calc_sample_mean(10, 6, 2)
[1] 6.032516
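
When arguments are referred to by name, their order no longer matters. The sketch below compares a positional call with a fully named call in a different order; set.seed() (with an arbitrary seed of 1) is used only so that both calls see the same random draws.

```r
calc_sample_mean <- function(sample_size, our_mean = 0, our_sd = 1) {
  sample <- rnorm(sample_size, mean = our_mean, sd = our_sd)
  mean(sample)
}

# These two calls mean the same thing; set.seed() makes them comparable
set.seed(1)
by_position <- calc_sample_mean(10, 6, 2)

set.seed(1)
by_name <- calc_sample_mean(our_sd = 2, our_mean = 6, sample_size = 10)

identical(by_position, by_name)  # TRUE
```

Naming the “detail” arguments tends to make calls easier to read than relying on position alone.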

Setting defaults

This won’t work though:

calc_sample_mean(our_mean = 5)

“Error in rnorm(sample_size, mean = our_mean, sd = our_sd) : argument "sample_size" is missing, with no default”
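
A minimal sketch of the failing call and its fix. tryCatch() captures the message so the script keeps running; the values 5 and 10 are arbitrary.

```r
calc_sample_mean <- function(sample_size, our_mean = 0, our_sd = 1) {
  sample <- rnorm(sample_size, mean = our_mean, sd = our_sd)
  mean(sample)
}

# sample_size has no default, so leaving it out is an error
msg <- tryCatch(
  calc_sample_mean(our_mean = 5),
  error = function(e) conditionMessage(e)
)
msg

# Supplying the required argument fixes the call
result <- calc_sample_mean(sample_size = 10, our_mean = 5)
```

Leaving “data” arguments without defaults is usually deliberate: the function has no sensible value to fall back on, so an error is more helpful than a silent guess.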

Key points

  • Write functions when you are using a set of operations repeatedly
  • Functions consist of arguments and a body and are usually assigned to a name.
  • Functions are for humans
    • pick names for the function and arguments that are clear and consistent
  • Debug your code as much as you can as you write it.
    • if you want to use your code with mutate(), test it with vector inputs
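
One simple vector test, sketched under the assumption that the function is meant to transform a column element by element, is to compare the length of the output with the length of the input. The vector test_input is a made-up example.

```r
rescale_01 <- function(x) (x - min(x)) / (max(x) - min(x))
calc_sample_mean <- function(sample_size) mean(rnorm(sample_size))

test_input <- c(1, 3, 30)

# A function that is safe to use column-wise should return
# one value per input element
length(rescale_01(test_input)) == length(test_input)        # TRUE
length(calc_sample_mean(test_input)) == length(test_input)  # FALSE
```

A FALSE here is a signal to reach for rowwise() or a similar per-element approach before using the function inside mutate().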

For more: See Functions Chapter in R for Data Science